
    FAST: Feature-Aware Student Knowledge Tracing

    Various kinds of e-learning systems, such as Massively Open Online Courses and intelligent tutoring systems, are now producing large amounts of feature-rich data from students solving items at different levels of proficiency over time. To analyze such data, researchers often use Knowledge Tracing [4], a 20-year-old method that has become the de facto standard for inferring student knowledge from performance data. Knowledge Tracing uses Hidden Markov Models (HMMs) to estimate the latent cognitive state (the student's knowledge) from the student's performance on items. Since the original Knowledge Tracing formulation does not allow modeling general features, a considerable amount of research has focused on ad hoc modifications to the Knowledge Tracing algorithm that enable modeling a specific feature of interest. This has led to a plethora of Knowledge Tracing reformulations for very specific purposes. For example, Pardos et al. [5] proposed a new model to measure the effect of students' individual characteristics, Beck et al. [2] modified Knowledge Tracing to assess the effect of help in a tutor system, and Xu and Mostow [7] proposed a new model that allows measuring the effect of subskills. These ad hoc models are successful for their own specific purposes, but they do not generalize to arbitrary features. Other student modeling methods that allow more flexible features have been proposed. For example, Performance Factor Analysis [6] uses logistic regression to model arbitrary features, but it does not infer whether the student has learned a skill. We present FAST (Feature-Aware Student knowledge Tracing), a novel method that brings general features into Knowledge Tracing. FAST combines Performance Factor Analysis (logistic regression) with Knowledge Tracing by leveraging previous work on unsupervised learning with features [3]. FAST is therefore able to infer student knowledge, as Knowledge Tracing does, while also allowing for arbitrary features, as Performance Factor Analysis does. FAST brings general features into Knowledge Tracing by replacing the generative emission probabilities (often called guess and slip probabilities) with logistic regression [3], so that these probabilities can change over time as the student's knowledge is inferred. FAST uses arbitrary features to train the logistic regression model and the HMM jointly; training the parameters simultaneously enables FAST to learn from the features, which differs from using regression merely to analyze the slip and guess probabilities [1]. To validate our approach, we use data collected from real students interacting with a tutor. We present experimental results comparing FAST with Knowledge Tracing and Performance Factor Analysis. We conduct experiments with our model using features such as item difficulty and the prior successes and failures of a student on the skill (or multiple skills) associated with the item, following the formulation of Performance Factor Analysis.
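To make the emission-replacement idea concrete, here is a minimal, hypothetical sketch of a two-state knowledge filter whose correctness probabilities come from logistic regression over features rather than fixed guess/slip parameters. It is not the authors' released implementation; the weights, feature values, and the predict-then-update filtering order are illustrative assumptions.

```python
# A minimal, hypothetical sketch (not the authors' released code) of the core
# idea described above: a two-state HMM over student knowledge whose emission
# probabilities come from logistic regression over arbitrary features instead
# of fixed guess/slip parameters. Weights, features, and rates are made up.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_correct(features, knowledge_state, w):
    """P(correct | knowledge state, features) via logistic regression.

    The knowledge state (0 = unlearned, 1 = learned) is appended to the
    feature vector, so the learned weights play the role of guess and slip
    but can shift with item difficulty, prior successes/failures, etc.
    """
    x = np.append(features, knowledge_state)
    return sigmoid(w @ x)

def forward_knowledge(observations, feature_seq, w, p_init=0.3, p_learn=0.15):
    """Forward (predict-then-update) pass: P(learned | responses so far)."""
    belief = np.array([1.0 - p_init, p_init])      # [unlearned, learned]
    T = np.array([[1.0 - p_learn, p_learn],        # transitions; no forgetting
                  [0.0,           1.0]])
    for obs, feats in zip(observations, feature_seq):
        pc = np.array([p_correct(feats, k, w) for k in (0, 1)])
        like = pc if obs == 1 else 1.0 - pc        # likelihood of the response
        belief = like * (belief @ T)
        belief /= belief.sum()
    return belief[1]                               # P(skill learned)

# toy usage: 3 responses; features are (item difficulty, prior failures)
w = np.array([-1.2, -0.4, 2.0])                    # hypothetical weights
feats = [np.array([0.5, 0.0]), np.array([0.2, 1.0]), np.array([0.8, 1.0])]
print(forward_knowledge([0, 1, 1], feats, w))
```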

    Your model is predictive— but is it useful? Theoretical and Empirical Considerations of a New Paradigm for Adaptive Tutoring Evaluation

    Classification evaluation metrics are often used to evaluate adaptive tutoring systems, that is, programs that teach and adapt to humans. Unfortunately, it is not clear how intuitive these metrics are for practitioners with little machine learning background. Moreover, our experiments suggest that existing conventions for evaluating tutoring systems may lead to suboptimal decisions. We propose the Learner Effort-Outcomes Paradigm (Leopard), a new framework to evaluate adaptive tutoring. We introduce Teal and White, novel automatic metrics that apply Leopard and quantify the amount of effort required to achieve a learning outcome. Our experiments suggest that our metrics are a better alternative for evaluating adaptive tutoring.

    General Features in Knowledge Tracing to Model Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge

    Knowledge Tracing is the de facto standard for inferring student knowledge from performance data. Unfortunately, it does not allow modeling the feature-rich data that can now be collected in modern digital learning environments. Because of this, many ad hoc Knowledge Tracing variants have been proposed to model a specific feature of interest. For example, variants have studied the effect of students' individual characteristics, the effect of help in a tutor, and subskills. These ad hoc models are successful for their own specific purposes, but each is built to model only a single specific feature. We present FAST (Feature Aware Student knowledge Tracing), an efficient, novel method that integrates general features into Knowledge Tracing. We demonstrate FAST's flexibility with three examples of feature sets that are relevant to a wide audience. We use features in FAST to model (i) multiple-subskill tracing, (ii) a temporal Item Response Model implementation, and (iii) expert knowledge. We present empirical results using data collected from an Intelligent Tutoring System. We report that using features can improve classification performance by up to 25% on the task of predicting student performance. Moreover, for fitting and inference, FAST can be 300 times faster than models created in BNT-SM, a toolkit that facilitates the creation of ad hoc Knowledge Tracing variants.
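As an illustration of what such feature sets might look like as inputs to a FAST-style model, the sketch below assembles one feature vector per observation from subskill indicators, a per-item indicator (a temporal IRT-style item intercept), PFA-style success/failure counts, and an expert-provided difficulty rating. The function name, column layout, and toy values are hypothetical, not the paper's actual encoding.

```python
# Hypothetical sketch of per-observation feature construction for the three
# example feature sets named above; layout and names are illustrative only.
import numpy as np

def build_features(item_id, subskill_ids, n_items, n_subskills,
                   prior_successes, prior_failures, expert_difficulty):
    """Concatenate example feature groups:
    (i) subskill indicators for multiple-subskill tracing,
    (ii) a per-item indicator, i.e. a temporal IRT-style item intercept,
    (iii) an expert-provided difficulty rating, plus PFA-style counts."""
    subskill_vec = np.zeros(n_subskills)
    subskill_vec[subskill_ids] = 1.0          # (i) subskills the item requires
    item_vec = np.zeros(n_items)
    item_vec[item_id] = 1.0                   # (ii) item intercept (difficulty)
    counts = np.array([prior_successes, prior_failures], dtype=float)
    expert = np.array([expert_difficulty])    # (iii) expert knowledge
    return np.concatenate([subskill_vec, item_vec, counts, expert])

# toy usage
x = build_features(item_id=3, subskill_ids=[0, 2], n_items=10, n_subskills=4,
                   prior_successes=2, prior_failures=1, expert_difficulty=0.7)
print(x.shape)   # (17,)
```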

    Impact of common cardio-metabolic risk factors on fatal and non-fatal cardiovascular disease in Latin America and the Caribbean: an individual-level pooled analysis of 31 cohort studies

    Background: Estimates of the burden of cardio-metabolic risk factors in Latin America and the Caribbean (LAC) rely on relative risks (RRs) from non-LAC countries. Whether these RRs apply to LAC remains unknown. Methods: We pooled LAC cohorts. We estimated RRs per unit of exposure to body mass index (BMI), systolic blood pressure (SBP), fasting plasma glucose (FPG), total cholesterol (TC) and non-HDL cholesterol on fatal (31 cohorts, n = 168,287) and non-fatal (13 cohorts, n = 27,554) cardiovascular diseases, adjusting for regression dilution bias. We used these RRs and national data on mean risk factor levels to estimate the number of cardiovascular deaths attributable to non-optimal levels of each risk factor. Results: Our RRs for SBP, FPG and TC were similar to those observed in cohorts conducted in high-income countries; however, for BMI, our RRs were consistently smaller in people below 75 years of age. Across risk factors, we observed smaller RRs among older ages. Non-optimal SBP was responsible for the largest number of attributable cardiovascular deaths, ranging from 38 per 100,000 women and 54 per 100,000 men in Peru, to 261 per 100,000 (Dominica, women) and 282 per 100,000 (Guyana, men). For non-HDL cholesterol, the lowest attributable rates were for women in Peru (21) and men in Guatemala (25), and the largest for men (158) and women (142) from Guyana. Interpretation: RRs for BMI from studies conducted in high-income countries may overestimate disease burden metrics in LAC; conversely, RRs for SBP, FPG and TC from LAC cohorts are similar to those estimated from cohorts in high-income countries.
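As a rough illustration of how a per-unit RR and a national mean exposure level can be combined into an attributable-deaths figure, the following simplified mean-shift formulation is one common approach; the study's exact comparative-risk-assessment method may differ.

```latex
% Simplified mean-shift sketch; x* denotes the theoretical-minimum-risk level.
\[
  \mathrm{RR}_{\mathrm{pop}} = \mathrm{RR}_{\mathrm{unit}}^{\;\bar{x} - x^{*}}, \qquad
  \mathrm{PAF} = \frac{\mathrm{RR}_{\mathrm{pop}} - 1}{\mathrm{RR}_{\mathrm{pop}}}, \qquad
  \text{attributable deaths} = \mathrm{PAF} \times \text{observed CVD deaths},
\]
where $\bar{x}$ is the national mean level of the risk factor (e.g.\ SBP),
$x^{*}$ its theoretical-minimum-risk level, and $\mathrm{RR}_{\mathrm{unit}}$
the pooled relative risk per unit of exposure, corrected for regression
dilution bias.
```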

    The Leopard Framework: Towards understanding educational technology interventions with a Pareto Efficiency Perspective

    Adaptive systems teach and adapt to humans; their promise is to improve education by minimizing the subset of items presented to students while maximizing student outcomes (Cen et al., 2007). In this context, items are questions, problems, or tasks that can be graded individually. The adaptive tutoring community has tacitly adopted conventions for evaluating tutoring systems (Dhanani et al., 2014) by using classification evaluation metrics that assess the student model component; student models are the subsystems that forecast whether a learner will answer the next item correctly. Unfortunately, it is not clear how intuitive classification metrics are for practitioners with little machine learning background. Moreover, our experiments on real and synthetic data reveal that it is possible to have student models that are very predictive (as measured by traditional classification metrics), yet provide little to no value to the learner. Additionally, when we compare alternative tutoring systems with classification metrics, we discover that they may favor tutoring systems that require higher student effort with no evidence that students are learning more. That is, when comparing two alternative systems, classification metrics may prefer a suboptimal system. We recently proposed the Learner Effort-Outcomes Paradigm (Leopard) for automatic evaluation of adaptive tutoring (González-Brenes & Huang, 2015). Leopard extends prior work on alternatives to classification evaluation metrics (Lee & Brunskill, 2012). At its core, Leopard quantifies both the effort and the outcomes of students in adaptive tutoring. Even though these metrics are novel in their own right, our contribution is approximating both without a randomized control trial. In this talk, we will describe our recently published results on meta-evaluating Leopard and conventional classification metrics. Additionally, we will present preliminary results of framing the value of an educational intervention as multi-objective programming. We argue that human-propelled machine learning, and educational technology in particular, aims to optimize the Pareto boundary of effort and outcomes of humans.
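As a concrete illustration of the Pareto-efficiency view sketched above, the following hypothetical snippet keeps only the tutoring configurations that are not dominated on the (effort, outcome) plane. The candidate systems, their effort and outcome values, and the function name are made up for illustration.

```python
# Illustrative sketch (not from the talk): each candidate tutoring
# configuration is summarized by (student effort, learning outcome), and we
# keep only configurations that are not dominated by another configuration.

def pareto_front(systems):
    """Return names of systems for which no other system has both lower (or
    equal) effort and higher (or equal) outcome, with at least one strict."""
    front = []
    for name, effort, outcome in systems:
        dominated = any(
            (e <= effort and o >= outcome) and (e < effort or o > outcome)
            for n, e, o in systems if n != name
        )
        if not dominated:
            front.append(name)
    return front

# toy usage: (name, expected items solved, expected post-test score)
candidates = [("tutor-A", 40, 0.78), ("tutor-B", 55, 0.78), ("tutor-C", 35, 0.70)]
print(pareto_front(candidates))   # ['tutor-A', 'tutor-C']
```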

    The White Method: Towards Automatic Evaluation Metrics for Adaptive Tutoring Systems

    Human-propelled machine learning systems are often evaluated with randomized control trials. Unfortunately, such trials can become extremely expensive and time-consuming to conduct because they may require institutional review board approvals, experimental design by an expert, recruiting (and often paying) enough participants to achieve statistical power, and data analysis. Alternatively, automatic evaluation metrics offer less expensive and faster comparisons between alternative systems. The fields that have agreed on automatic metrics have seen an accelerated pace of technological progress. For example, the widespread adoption of the Bleu metric (Papineni et al., 2001) in the machine translation community has lowered the cost of development and evaluation of translation systems. At the same time, the low cost of the Bleu metric has enabled machine translation competitions that have resulted in great advances in translation quality. Similarly, the Rouge metric (Lin and Hovy, 2002) has helped the automatic summarization community transition from expensive user studies of human judgments, which may take thousands of hours to conduct, to an automatic metric that can be computed very quickly.
We study how to evaluate adaptive intelligent tutoring systems, which are systems that teach and adapt to humans. These systems are complex and are often made up of many components (Almond et al., 2001), such as a student model, a content pool, and a cognitive model. We focus on evaluating tutoring systems that adapt the items students should solve, where items are questions, problems, or tasks that can be graded individually. These adaptive systems optimize the subset of items to be given to the student according to the student's historical performance (Corbett and Anderson, 1995) or features extracted from their activities (González-Brenes et al., 2014). Adaptive tutoring implies making a trade-off between minimizing the amount of practice a student is assigned and maximizing her learning gains (Cen et al., 2007). Practicing a skill may improve skill proficiency, at the cost of a missed opportunity for teaching new material.
Prior work (Pardos and Yudelson, 2013; Pelánek, 2014; Dhanani et al., 2014) has surveyed different evaluation methods for adaptive systems. A tutoring system is usually evaluated by using a classification evaluation metric to assess its student model, or by a randomized control trial. The student model is the component of the tutoring system that forecasts whether a student will answer the next item correctly. Popular evaluation metrics for student models include classification accuracy, the Area Under the Curve (AUC) of the Receiver Operating Characteristic curve, and, strangely for classifiers, the Root Mean Square Error. As a convention, many authors report as a baseline the performance of a majority classifier, even though this classifier is not a student model that can be translated into a teaching policy. Lee and Brunskill (2012) propose a promising evaluation metric that calculates the expected number of practice opportunities that students require to master the content of the curriculum of the tutoring system. Their method is very successful for its purpose, but it is limited to a particular student model called Knowledge Tracing. Their approach requires a researcher to derive the theoretical expected behavior of the student model being evaluated, which is not possible to calculate in general.
We propose WHole Intelligent Tutoring system Evaluation (White), a novel automatic method that evaluates the recommendations of an adaptive system. White overcomes the limitations of previous work on tutoring system evaluation by using student data and by allowing the assessment of arbitrary student models. White relies on counterfactual simulations: it reproduces the decisions that the tutoring system would have made given the input data of the test set. The input to White is (i) a policy that describes when a subset of the tutoring system's items should be presented to the student, and (ii) the student model's predictions on the test set. For each student in the test set, White estimates their counterfactual effort, that is, how many items the student would have solved using the tutoring system. White also calculates a counterfactual score (grade) to represent student learning. The student effort and score act as proxies for the design goals of a tutoring system: maximizing learning while minimizing student effort. Under some mild assumptions about the tutoring system, White can evaluate a large array of tutoring systems with different student models. Our experiments on real and synthetic data reveal that it is possible to have student models that score highly on predictive performance with traditional classification metrics, yet provide little educational value to the learner. Moreover, when we compare alternative tutoring systems with these classification metrics, we discover that they may favor tutoring systems that require higher student effort with no evidence that students are learning more. That is, when comparing two alternative systems, classification metrics may prefer a suboptimal system. Our results add to the growing body of evidence against using classification metrics to evaluate tutoring systems (Beck and Xiong, 2013). White is an evaluation method designed to evaluate tutoring systems on student effort and student learning, and it provides a better alternative for assessing adaptive systems.
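To illustrate the kind of counterfactual replay described above, here is a hedged sketch that assumes a simple mastery-threshold stopping policy and defines the counterfactual score as observed correctness on the items that would have been presented; the published method's exact policy interface and score definition may differ, and all names and values are illustrative.

```python
# Hedged sketch of a White-style counterfactual replay under an assumed
# mastery-threshold stopping policy. Inputs: per-student lists of
# (model-predicted P(correct on next item), observed correctness).

def white_style_replay(students, mastery_threshold=0.95):
    """Return average counterfactual effort and score over the test set."""
    efforts, scores = [], []
    for sequence in students:
        presented = []
        for p_correct, observed in sequence:
            if p_correct >= mastery_threshold:
                break                      # policy: stop once mastery is predicted
            presented.append(observed)     # item the student would have been given
        efforts.append(len(presented))
        # assumed score: observed correctness on the items actually presented
        scores.append(sum(presented) / len(presented) if presented else 1.0)
    return sum(efforts) / len(efforts), sum(scores) / len(scores)

# toy usage: two students, (prediction, observed-correct) pairs
data = [[(0.4, 0), (0.7, 1), (0.96, 1)],
        [(0.5, 1), (0.97, 1), (0.99, 1)]]
print(white_style_replay(data))   # (1.5, 0.75)
```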

    The FAST toolkit for Unsupervised Learning of HMMs with Features

    FAST is a toolkit for adding features to Hidden Markov Models (HMMs). It implements a recent variation of the Expectation-Maximization algorithm (Berg-Kirkpatrick et al., 2010) that allows the use of logistic regression in unsupervised learning. We demonstrate FAST on the task of predicting future student performance. Our toolkit is up to 300x faster than BNT (a Bayesian Network toolkit), and up to 25% better than conventional HMMs (with no features).
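The following is a minimal, hypothetical sketch of the featurized M-step idea behind that EM variant: instead of re-estimating an emission table from expected counts, logistic-regression weights are fit by gradient ascent on the expected log-likelihood, with each row weighted by the E-step's state posterior. The design-matrix layout, step size, and toy numbers are assumptions for illustration, not the toolkit's actual code.

```python
# Minimal illustrative sketch of a featurized M-step. X holds one row per
# (observation, latent state) pair with the state indicator as the last
# column; gamma holds the E-step posterior of that state at that time step;
# y is the observed correctness replicated across states.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def m_step_weights(X, y, gamma, w, lr=0.5, iters=500):
    """Gradient ascent on the expected log-likelihood
    sum over (t, k) of gamma[t, k] * log P(y_t | x_{t,k}; w)."""
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (gamma * (y - p))   # posterior-weighted logistic gradient
        w = w + lr * grad / len(y)
    return w

# toy usage: 2 observations x 2 states -> 4 replicated rows
X = np.array([[0.5, 0.0], [0.5, 1.0],    # first column: item difficulty
              [0.2, 0.0], [0.2, 1.0]])   # last column: latent-state indicator
y = np.array([0.0, 0.0, 1.0, 1.0])
gamma = np.array([0.7, 0.3, 0.4, 0.6])
print(m_step_weights(X, y, gamma, w=np.zeros(2)))
```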

    Anthocyanins and Their Variation in Red Wines I. Monomeric Anthocyanins and Their Color Expression

    Originating in grapes, monomeric anthocyanins in young red wines contribute the majority of the color and the supposed beneficial health effects related to their consumption, and as such they are recognized as one of the most important groups of phenolic metabolites in red wines. In recent years, our increasing knowledge of the chemical complexity of monomeric anthocyanins and their stability, together with phenomena such as self-association and copigmentation that can stabilize and enhance their color, has helped to explain their color expression during red wine making and aging. A series of new enological practices has been developed to improve anthocyanin extraction as well as color expression and maintenance. This paper summarizes the most recent advances in the study of monomeric anthocyanins in red wines, emphasizing their origin, occurrence, color-enhancing effects, and degradation, and the effects of various enological practices on them.